Capturing Global Informativeness in Open Domain Keyphrase Extraction
Abstract
Open-domain KeyPhrase Extraction (KPE) aims to extract keyphrases from documents without domain or quality restrictions, e.g., web pages with variant domains and qualities. Recently, neural methods have shown promising results in many KPE tasks due to their powerful capacity for modeling contextual semantics of the given documents. However, we empirically show that most neural KPE methods prefer to extract keyphrases with good phraseness, such as short and entity-style n-grams, instead of globally informative keyphrases from open-domain documents. This paper presents JointKPE, an open-domain KPE architecture built on pre-trained language models, which can capture both local phraseness and global informativeness when extracting keyphrases. JointKPE learns to rank keyphrases by estimating their informativeness in the entire document and is jointly trained on the keyphrase chunking task to guarantee the phraseness of keyphrase candidates. Experiments on two large KPE datasets with diverse domains, OpenKP and KP20k, demonstrate the effectiveness of JointKPE on different pre-trained variants in open-domain scenarios. Further analyses reveal the significant advantages of JointKPE in predicting long and non-entity keyphrases, which are challenging for previous neural KPE methods. Our code is publicly available at https://github.com/thunlp/BERT-KPE.
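The abstract describes the JointKPE objective only at a high level. Below is a minimal, framework-free sketch of that idea under stated assumptions. It is not the authors' implementation, which lives in the linked repository. The sketch assumes that a phrase's document-level ("global") informativeness is the max over the local scores of all its occurrences. It uses a pairwise hinge loss for ranking and a binary cross-entropy loss for the chunking task, simplified here to the phrase level. The function names, the `margin`, and the weight `alpha` are hypothetical. In the real model, the local scores and chunk probabilities would come from a pre-trained encoder.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def global_informativeness(occurrences: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate per-occurrence local scores into one score per phrase.

    Assumption (hypothetical simplification): max-pooling over all
    occurrences of the same phrase in the document.
    """
    scores: Dict[str, float] = {}
    for phrase, local_score in occurrences:
        scores[phrase] = max(local_score, scores.get(phrase, -math.inf))
    return scores


def ranking_loss(scores: Dict[str, float], gold: Set[str], margin: float = 1.0) -> float:
    """Pairwise hinge loss: every gold phrase should outscore every non-gold one by `margin`."""
    pos = [s for p, s in scores.items() if p in gold]
    neg = [s for p, s in scores.items() if p not in gold]
    if not pos or not neg:
        return 0.0
    total = sum(max(0.0, margin - sp + sn) for sp in pos for sn in neg)
    return total / (len(pos) * len(neg))


def chunking_loss(chunk_probs: Dict[str, float], gold: Set[str], eps: float = 1e-7) -> float:
    """Binary cross-entropy for 'is this candidate a keyphrase chunk?' (hypothetical phrase-level form)."""
    if not chunk_probs:
        return 0.0
    total = 0.0
    for phrase, p in chunk_probs.items():
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        y = 1.0 if phrase in gold else 0.0
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(chunk_probs)


def joint_loss(
    scores: Dict[str, float],
    chunk_probs: Dict[str, float],
    gold: Set[str],
    alpha: float = 1.0,
) -> float:
    """Joint objective: ranking loss plus `alpha`-weighted chunking loss (assumed additive form)."""
    return ranking_loss(scores, gold) + alpha * chunking_loss(chunk_probs, gold)


if __name__ == "__main__":
    # Toy local scores; "neural network" occurs twice, so its max (0.9) is kept.
    occ = [("neural network", 0.4), ("keyphrase extraction", 1.2),
           ("neural network", 0.9), ("web page", 0.1)]
    scores = global_informativeness(occ)
    print(sorted(scores, key=scores.get, reverse=True))
```

Running the demo prints `['keyphrase extraction', 'neural network', 'web page']`. Pooling over occurrences lets a phrase that is weak in one sentence but strong elsewhere rank by its best evidence in the whole document. This is the intuition behind "global informativeness" as opposed to purely local phraseness.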
Similar Resources
Domain-Specific Keyphrase Extraction
Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to existing documents is very laborious. Therefore it is highly desirable to automate the keyphrase extraction process. This paper shows that a simple procedure for keyphrase extraction based on the naive...
A New Domain Independent Keyphrase Extraction System
In this paper we present a keyphrase extraction system that can extract potential phrases from a single document in an unsupervised, domain-independent way. We extract word n-grams from input document. We incorporate linguistic knowledge (i.e., part-of-speech tags), and statistical information (i.e., frequency, position, lifespan) of each n-gram in defining candidate phrases and their respectiv...
Pke: an Open Source Python-based Keyphrase Extraction Toolkit
We describe pke, an open source python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new approaches. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction approaches, and ships with supervised models trained on the SemEval-2010 dataset (Kim et al., 2010).
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-88483-3_21